51 research outputs found

    A comparative study between rough and decision tree classifiers

    Rule-based classification systems (RBC) are widely used in many real-world applications because of the easy interpretability of rules. An RBC mines a collection of rules, i.e. knowledge hidden in a dataset, in order to map new cases accurately to decision classes. In practice, the number of attributes in a dataset can be very large, owing to the capability of database technology to store a great deal of information. Such a large dataset may contain thousands of relationships and will likely provide more knowledge, since the interrelationships between data give a richer description. However, it may also yield a large number of rules, including unnecessary or redundant rules in the model. Theoretically, a good set of knowledge should provide good accuracy when dealing with new cases. Besides accuracy, a good rule set should also contain as few rules as possible, and each rule should be as short as possible. It is often the case that a rule set containing fewer rules has rules with more conditions. An ideal model should produce fewer, shorter rules and classify new data with good accuracy; such compact, quality knowledge provides a manager with a good decision model. For this reason, the search for a data mining approach that can provide quality knowledge is important. The rough classifier (RC) and the decision tree classifier (DTC) are both categorized as RBCs. The purpose of this study is to investigate the capability of RC and DTC to generate quality knowledge that leads to good accuracy. To achieve that, both classifiers are compared on four measurements: classification accuracy, number of rules, rule length, and rule coverage. Five datasets from the UCI Machine Learning repository, namely United States Congressional Voting Records, Credit Approval, Wisconsin Diagnostic Breast Cancer, Pima Indians Diabetes Database, and Vehicle Silhouettes, were chosen for the experiments. All datasets were mined using the RC toolkit ROSETTA, while the C4.5 algorithm in the WEKA application was chosen as the DTC rule generator. The experimental results indicate that both classifiers produced good classification results and generated quality rules of different kinds: higher accuracy, fewer rules, shorter rules, and higher coverage. In terms of accuracy, RC obtained higher accuracy on average, while DTC generated a significantly lower number of rules than RC. In terms of rule length, RC produced more compact and shorter rules than DTC, although the difference is not significant. Meanwhile, RC had better coverage than DTC. The final conclusion can be stated as follows: if the user is interested in a variety of rule patterns with good accuracy and the number of rules is not important, RC is the best solution, whereas if the user looks for fewer rules, DTC might be the best choice.
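The four measurements above can be illustrated with a small sketch. The rule representation, toy records, and function names below are illustrative assumptions, not the study's code or data:

```python
# Hypothetical sketch of the four rule-quality measurements:
# accuracy, number of rules, average rule length, and rule coverage.

def matches(rule_conds, record):
    """A rule fires when every one of its conditions equals the record's value."""
    return all(record.get(attr) == val for attr, val in rule_conds.items())

def evaluate(rules, records):
    correct = covered = 0
    for rec in records:
        fired = [cls for conds, cls in rules if matches(conds, rec)]
        if fired:
            covered += 1                      # at least one rule covers this case
            if fired[0] == rec["class"]:
                correct += 1                  # first firing rule decides the class
    return {
        "accuracy": correct / len(records),
        "n_rules": len(rules),
        "avg_length": sum(len(c) for c, _ in rules) / len(rules),
        "coverage": covered / len(records),
    }

# Toy rules and records loosely in the style of the Voting dataset.
rules = [({"vote1": "y"}, "democrat"),
         ({"vote1": "n", "vote2": "y"}, "republican")]
records = [{"vote1": "y", "vote2": "n", "class": "democrat"},
           {"vote1": "n", "vote2": "y", "class": "republican"},
           {"vote1": "n", "vote2": "n", "class": "democrat"}]

print(evaluate(rules, records))
```

Here the third record is not covered by any rule, so coverage is 2/3; a "shorter" rule set is one whose rules carry fewer conditions on average.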

    Univariate Financial Time Series Prediction using Clonal Selection Algorithm

    The ability to predict the financial market is beneficial not only to individuals but also to organizations and countries. It is beneficial not only in financial terms but also for making short-term and long-term decisions. This paper presents an experimental study of univariate financial time series prediction using a clonal selection algorithm (CSA). CSA is an optimization algorithm based on clonal selection theory. It belongs to the artificial immune systems, a class of evolutionary algorithms inspired by the immune system of vertebrates. Since CSA is an optimization algorithm, the univariate financial time series prediction problem was modeled as an optimization problem using a weighted regression model. CSA was used to search for the optimal set of weights for the regression model so as to generate predictions with the lowest error. Three data sets from the financial market were chosen for the experiments, namely the S&P500 price, the gold price, and the EUR-USD exchange rate. The performance of CSA is measured using RMSE. The RMSE value for a problem is related to the maximum and minimum values of the data set; therefore, the results were not compared across data sets but to the range of values of each data set. The experiments show that CSA can make decent predictions for financial time series, despite being inferior to ARIMA. Hence, this finding implies that CSA can be applied to a univariate financial time series prediction problem, provided the problem is modeled as an optimization problem.
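The idea of searching for regression weights with clonal selection can be sketched as follows. The population size, clone count, mutation scale, and toy series are illustrative assumptions, not the study's settings:

```python
# Minimal clonal-selection sketch for a weighted autoregression:
# predict y[t] from the previous `lag` values via learned weights,
# and evolve the weights to minimize RMSE.
import random

def rmse(w, series, lag):
    errs = [(sum(w[i] * series[t - lag + i] for i in range(lag)) - series[t]) ** 2
            for t in range(lag, len(series))]
    return (sum(errs) / len(errs)) ** 0.5

def csa_fit(series, lag=3, pop=20, n_clones=5, gens=40, seed=0):
    rng = random.Random(seed)
    # Antibodies = candidate weight vectors.
    ab = [[rng.uniform(-1, 1) for _ in range(lag)] for _ in range(pop)]
    for _ in range(gens):
        ab.sort(key=lambda w: rmse(w, series, lag))
        elite = ab[: pop // 2]
        # Clone the fittest antibodies; lower-ranked clones mutate more.
        clones = [[wi + rng.gauss(0, 0.1 * (rank + 1)) for wi in w]
                  for rank, w in enumerate(elite) for _ in range(n_clones)]
        ab = sorted(elite + clones, key=lambda w: rmse(w, series, lag))[:pop]
    return ab[0]

# Toy upward-trending series standing in for a price history.
series = [0.5 * t + (0.1 if t % 2 else -0.1) for t in range(40)]
best = csa_fit(series)
```

Elitism keeps the best weight vector across generations, so the final RMSE can only improve on the initial population's best.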

    An improved artificial dendrite cell algorithm for abnormal signal detection

    In the dendrite cell algorithm (DCA), the abnormality of a data point is determined by comparing the multi-context antigen value (MCAV) with an anomaly threshold. The limitation of the existing threshold is that its value must be determined before mining, based on previous information, and the existing MCAV is inefficient when exposed to extreme values. This causes the DCA to fail to detect new data points whose pattern is distinct from previous information, and it affects detection accuracy. This paper proposes an improved anomaly threshold for DCA using the statistical cumulative sum (CUSUM), with the aim of improving its detection capability. In the proposed approach, the MCAV was normalized with the upper CUSUM, and the new anomaly threshold was calculated at run time by considering the acceptance value and the minimum MCAV. From the experiments on 12 benchmark and two outbreak datasets, the improved DCA is shown to have better detection results than its previous version in terms of sensitivity, specificity, false detection rate, and accuracy.
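A generic upper-CUSUM computation in the spirit of this approach is sketched below. The paper's exact normalization and threshold formula are not reproduced here, so the `acceptance` parameter and the way it is combined with the minimum MCAV are illustrative assumptions:

```python
# Sketch of an upper CUSUM over MCAV scores with a run-time threshold.
# The combination rule (min MCAV + acceptance value) is an assumption.

def upper_cusum(values, target, slack=0.0):
    """Cumulative sum of positive deviations from a target level."""
    s, out = 0.0, []
    for v in values:
        s = max(0.0, s + (v - target - slack))
        out.append(s)
    return out

def adaptive_flags(mcav, acceptance=0.5):
    # Normalize MCAV with the upper CUSUM around the running mean.
    cusum = upper_cusum(mcav, target=sum(mcav) / len(mcav))
    # Threshold computed at run time, not fixed before mining.
    threshold = min(mcav) + acceptance
    return [c > threshold for c in cusum]

mcav = [0.1, 0.2, 0.1, 0.9, 0.95, 0.15]
print(adaptive_flags(mcav))  # sustained high MCAVs trip the threshold
```

Because the CUSUM accumulates only sustained positive deviations, a single extreme MCAV value moves the statistic far less than a persistent shift does.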

    Determining the impact of window length on time series forecasting using deep learning

    Time series forecasting is a method of predicting the future based on previous observations. It depends on the values of the same variable at different time periods. To date, various models have been used in stock market time series forecasting, in particular deep learning models. However, existing implementations of these models did not determine the suitable number of previous observations, that is, the window length. Hence, this study investigates the impact of the window length of a long short-term memory (LSTM) model in forecasting the stock market price. The forecasting is performed on the S&P500 daily closing price data set. Window lengths of 25, 50, and 100 days were tested on the same model and data set. The experiment shows that different window lengths produce different forecasting accuracy. On the employed dataset, it is best to use 100 as the window length for forecasting the stock market price. This finding indicates the importance of determining a suitable window length for the problem at hand, as there is no one-size-fits-all model in time series forecasting.
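The role of the window length is easiest to see in how the series is turned into supervised training pairs. The function name and the toy series below are assumptions, not the study's code:

```python
# Turn a univariate series into (input window -> next value) pairs,
# as is typically done before training an LSTM forecaster.

def make_windows(series, window):
    """Each sample maps `window` past values to the next value."""
    X = [series[i : i + window] for i in range(len(series) - window)]
    y = [series[i + window] for i in range(len(series) - window)]
    return X, y

prices = list(range(10))          # stand-in for S&P500 daily closes
X, y = make_windows(prices, window=3)
print(X[0], y[0])                 # [0, 1, 2] 3
```

A longer window gives the model more history per sample but yields fewer samples overall, which is one reason the best length is data-dependent.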

    Comparing the knowledge quality in rough classifier and decision tree classifier

    This paper presents a comparative study of two rule-based classifiers: the rough set classifier (Rc) and the decision tree classifier (DTc). The two techniques apply different approaches to classification but produce the same structure of output with comparable results. Theoretically, different classifiers will generate different sets of rules, i.e. knowledge, even when they are applied to the same classification problem. Hence, the aim of this paper is to investigate the quality of the knowledge produced by Rc and DTc when similar problems are presented to them. Four important performance metrics are used for the comparison: classification accuracy, rule quantity, rule length, and rule coverage. Five datasets from the UCI Machine Learning repository are chosen and mined using the Rc toolkit ROSETTA, while the C4.5 algorithm in the WEKA application is chosen as the DTc rule generator. The experimental results show that both Rc and DTc are capable of generating quality knowledge, since most of the results are comparable. Rc is the more accurate classifier and produces shorter, simpler rules with higher coverage. Meanwhile, DTc generates a significantly smaller number of rules.
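In a decision tree, each root-to-leaf path is one rule, so rule quantity and rule length can be read directly off the tree structure. The toy tree below is a hypothetical example, not WEKA's C4.5 output:

```python
# Count the rules in a decision tree and measure their lengths:
# one rule per leaf, with length = depth of the root-to-leaf path.

tree = {"outlook": {                  # hypothetical C4.5-style tree
    "sunny": {"humidity": {"high": "no", "normal": "yes"}},
    "rain": "yes",
}}

def paths(node, depth=0):
    if not isinstance(node, dict):
        return [depth]                # leaf reached: depth = rule length
    branches = next(iter(node.values()))
    return [d for child in branches.values() for d in paths(child, depth + 1)]

lengths = paths(tree)
n_rules = len(lengths)
avg_len = sum(lengths) / n_rules
print(n_rules, avg_len)
```

This is why pruning a tree simultaneously lowers the rule count and shortens the average rule, the two properties on which DTc and Rc are compared above.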

    Design and Development of Smart Home Security System for Disabled and Elderly People

    This paper discusses an ongoing project that serves the needs of people with disabilities and the elderly at home. It uses WiFi technology to establish communication between a user’s smartphone and a controller board. The project uses a microcontroller to control the door lock and is equipped with a camera to identify visitors. By connecting the servo motor and camera to a Raspberry Pi controller board, the lock can be controlled via WiFi, providing remote access from a smartphone.

    The design of F-CMS: A flexible conference management system

    A conference management system (CMS) is designed to help the conference committee manage a conference well. The CMSs available in the market today provide well-managed pre-conference functions such as paper reviewing, paper submission, and participant registration. However, the payment module is not given priority by existing CMSs. This study argues that payment management is important to simplify the payment process and to avoid unpaid papers being published in the proceedings; it also lets the conference committee easily calculate the conference profit when the event ends. Moreover, existing CMSs are inflexible in handling certain cases, for example when authors are unable to pay the fee before the conference day but need to submit the camera-ready copy. Hence, this paper explains the design of a flexible conference management system (f-CMS). f-CMS is developed using the RAD approach and also includes a registration module for the conference day. This paper presents a review of the literature and the early stages of the development of f-CMS.

    An evaluation of feature selection technique for dendrite cell algorithm

    The dendrite cell algorithm needs appropriate features to represent its specific input signals. Although many feature selection algorithms have been used to identify appropriate features for dendrite cell signals, some algorithms have never been investigated, and there is limited work comparing their performance. In this study, six feature selection algorithms, namely Information Gain, Gain Ratio, Symmetrical Uncertainty, Chi Square, Support Vector Machine, and Rough Set with Genetic Algorithm Reduct, are examined, and their effectiveness in representing dendrite cell signals is evaluated. Eight universal datasets are chosen, and performance is assessed according to sensitivity, specificity, and accuracy. From the experiments, the Rough Set with Genetic Algorithm Reduct is found to be the most effective feature selection technique for the dendrite cell algorithm, as it generates consistent results across all evaluation metrics. On single evaluation metrics, the Chi Square technique performs best in terms of sensitivity, while the Rough Set with Genetic Algorithm Reduct is best at specificity and accuracy. As a next step, further analysis will be conducted on complex datasets such as time series data.
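One of the six techniques compared above, information gain, can be sketched in a few lines. The toy symptom dataset is an illustrative assumption:

```python
# Rank features by information gain: the reduction in class entropy
# achieved by splitting the data on each feature.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    gain = entropy(labels)
    for value in set(r[feature] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

rows = [{"fever": 1, "cough": 0}, {"fever": 1, "cough": 1},
        {"fever": 0, "cough": 0}, {"fever": 0, "cough": 1}]
labels = ["sick", "sick", "well", "well"]

ranked = sorted(rows[0], key=lambda f: info_gain(rows, labels, f), reverse=True)
print(ranked)  # 'fever' perfectly separates the classes, 'cough' does not
```

Gain Ratio and Symmetrical Uncertainty are normalized variants of this same quantity, which is why the three often rank features similarly.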

    An adaptive anomaly threshold in artificial dendrite cell algorithm

    The dendrite cell algorithm (DCA) relies on the multi-context antigen value (MCAV) to determine the abnormality of a record by comparing it with an anomaly threshold. In practice, the threshold is pre-determined before mining based on previous information, and the existing MCAV is inefficient when exposed to extreme values. This causes the DCA to fail to detect unlabeled data whose new pattern is distinct from previous information, and it reduces detection accuracy. This paper proposes an adaptive anomaly threshold for DCA using the statistical cumulative sum (CUSUM), with the aim of improving its detection capability. In the proposed approach, the MCAV was normalized with the upper CUSUM, and the new anomaly threshold was calculated at run time by considering the acceptance value and the minimum MCAV. From the experiments on 12 datasets, the new version of DCA generated better detection results than its previous version in terms of sensitivity, specificity, false detection rate, and accuracy.